Please complete this R-markdown document with your group by answering the questions in intuit-quickbooks.pdf on Dropbox (week6/readings/). Create an HTML file with all your results and comments and push both the Rmarkdown and HTML file to GitLab when your team is done. All results MUST be reproducible (i.e., the TA and I must be able to recreate the HTML from the Rmarkdown file without changes or errors). This means that you should NOT use any R-packages that are not part of the rsm-msba-spark docker container.
This is the first group assignment for MGTA 455 and you will be using git and GitLab. If two people edit the same file at the same time you could get what is called a “merge conflict”. git will not decide for you whose changes to accept, so the team-lead will have to determine which edits to use. To avoid merge conflicts, always click “pull” in Rstudio before you start working on a file. Then, when you are done, save and commit your changes, and then push them to GitLab. Make this a habit!
If multiple people are going to work on the assignment at the same time I recommend you work on different files. You can use source to include R-code in your Rmarkdown document or include other R(markdown) documents into the main assignment file.
Group work-flow tips as discussed during ICT in Summer II are shown below:

- Use the `source` command to bring different pieces of code together into an Rmarkdown document or into an R-code file.

A graphical depiction of the group work-flow is shown below:
Additional resources on the use of git are linked below:
We can reach the following conclusions based on the EDA output:
Customers in zip_bins = 1 have a markedly higher response rate than customers in the other zip bins.
The variables “sex” and “bizflag” have no significant effect on “res1”.
Based on the “Recency, Frequency and Monetary” framework, we are more interested in how recently a customer made a purchase than in how long ago the customer placed their first order; we therefore pick “last” over “sincepurch” for use in the model.
Also based on the “Recency, Frequency and Monetary” framework, we care more about how frequently a customer places orders with the company than about how much money the customer has spent. Moreover, the two features are strongly correlated, since customers who order more frequently generally spend more in total. We therefore use only one of them, “numords”, in our model.
Based on the finding that customers in zip_bins = 1 have a markedly higher response rate than other zip bin customers, we think it is worth giving these customers more weight when predicting purchase probability. A new variable, “zip_one”, is created to let the model do so.
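A minimal sketch of how such a dummy variable can be created; the data frame name `intuit75k` is an assumption based on the output shown:

```r
library(dplyr)

## Flag customers in the first zip bin so the model can weight them separately
intuit75k <- intuit75k %>%
  mutate(zip_one = zip_bins == 1)
```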
id zip zip_bins zip_one
1 1 94553 18 FALSE
2 2 53190 10 FALSE
3 3 37091 8 FALSE
4 4 02125 1 TRUE
5 5 60201 11 FALSE
6 6 12309 3 FALSE
We then investigated further to see why bin 1 has such a high response rate. Breaking down the actual zip codes, we explored the response rate by area, represented by the first three digits of the zip code. We found that the zip codes starting with “008” (the US Virgin Islands) account for 1,891 responses with a response rate of 0.398, much higher than any other area. We therefore create another new variable, “VI”.
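The `VI` dummy can be built the same way, assuming `zip` is stored as a 5-character string so that leading zeros are preserved:

```r
library(dplyr)

## TRUE for customers whose zip code starts with "008" (US Virgin Islands)
intuit75k <- intuit75k %>%
  mutate(VI = substr(zip, 1, 3) == "008")
```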
id zip zip_bins VI
1 1 94553 18 FALSE
2 2 53190 10 FALSE
3 3 37091 8 FALSE
4 4 02125 1 FALSE
5 5 60201 11 FALSE
6 6 12309 3 FALSE
Based on the cost and margin given, we use the break-even rate of 0.024 as the cut-off to label whether Intuit should mail a customer in the second wave.
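As a quick check, the break-even rate follows directly from the cost and margin; the $1.41 mailing cost and $60 margin per respondent used here are assumptions consistent with the mailing-cost and margin figures reported in this document:

```r
mail_cost <- 1.41  # assumed cost per wave-2 mailing
margin    <- 60    # assumed margin per responding customer

## Mailing a customer breaks even when the expected margin covers the cost
breakeven <- mail_cost / margin
breakeven  # 0.0235, i.e., the 0.024 cut-off used in this analysis
```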
We assign an RFM id to every customer in the dataset and split the data into a training and a validation set. Using the break-even rate as the cut-off on the training set, we identify the profitable RFM cells and mail to those cells in the validation set.
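A sketch of the RFM indexing and cell-based targeting; the independent quintile construction via `ntile()` and the `training` split variable are assumptions:

```r
library(dplyr)

## Build independent RFM quintiles and combine them into one cell id
intuit75k <- intuit75k %>%
  mutate(
    rec_iq  = ntile(last, 5),           # recency: smaller "last" = more recent
    freq_iq = ntile(desc(numords), 5),  # frequency: more orders = bin 1
    mon_iq  = ntile(desc(dollars), 5),  # monetary: higher spend = bin 1
    rfm_iq  = paste0(rec_iq, freq_iq, mon_iq)
  )

train <- filter(intuit75k, training == 1)
valid <- filter(intuit75k, training == 0)

## Response rate per RFM cell estimated on the training data ...
cell_resp <- train %>%
  group_by(rfm_iq) %>%
  summarise(rfm_resp = mean(res1 == "Yes"), .groups = "drop")

## ... applied to the validation data: mail cells above break-even
valid <- valid %>%
  left_join(cell_resp, by = "rfm_iq") %>%
  mutate(mailto_wave2 = rfm_resp > 0.024)
```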
Partial list of RFM ids:
[1] "142" "354" "354" "154" "224" "232" "453" "452" "311" "124"
Training
Based on our analysis, Intuit should mail 42,842 customers, i.e., 81.60% of the training sample. The predicted response rate for the selected customers is 5.40% (2,312 buyers), while the actual response rate is 4.76% (2,498 buyers). The predicted margin is $138,720.00 versus an actual margin of $149,880.00. The expected profit is $78,313, the estimated mailing cost is $60,407, and the ROME is 1.30.
Validation
Based on our analysis, Intuit should mail 18,540 customers, i.e., 82.40% of the validation sample. The predicted response rate for the selected customers is 5.36% (994 buyers), while the actual response rate is 4.90% (1,103 buyers). The predicted margin is $59,640.00 versus an actual margin of $66,180.00. The expected profit is $33,499, the estimated mailing cost is $26,141, and the ROME is 1.28.
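The metrics above can be reproduced with a few lines; this sketch assumes a data frame `valid` with a logical `mailto_wave2` column and predicted response rates in `rfm_resp` (both hypothetical names), plus the $1.41 cost and $60 margin implied by the reported figures:

```r
mail_cost_pc <- 1.41  # assumed cost per mailing
margin_pc    <- 60    # assumed margin per buyer

nr_mail   <- sum(valid$mailto_wave2)                  # customers mailed
perc_mail <- nr_mail / nrow(valid)                    # share of customers mailed
rep_rate  <- mean(valid$rfm_resp[valid$mailto_wave2]) # predicted response rate
nr_resp   <- nr_mail * rep_rate                       # predicted buyers

mail_cost <- mail_cost_pc * nr_mail
profit    <- margin_pc * nr_resp - mail_cost
ROME      <- profit / mail_cost  # return on marketing expenditure
```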
Uncertain whether to choose “zip_one”, which up-weights all customers in zip_bin 1, or “VI”, which up-weights only customers in the Virgin Islands area, we decided to build one model for each variable and pick the one that yields more profit in the validation set.
For both models, we re-estimate the model 100 times, each time on a different bootstrap sample of the training data; we then use the 5th percentile of the 100 predictions as a conservative lower bound on the estimated purchase probability.
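This bootstrap procedure can be sketched with base R's `glm()`; the formula matches the explanatory variables reported for the first model, and `train`/`valid` are assumed training/validation splits of the data:

```r
set.seed(1234)  # for reproducibility

form <- res1 ~ zip_one + numords + last + version1 + owntaxprod + upgraded

## Re-estimate the model on 100 bootstrap samples of the training data;
## each column of `pred` holds one set of validation predictions
pred <- sapply(1:100, function(i) {
  boot <- train[sample(nrow(train), replace = TRUE), ]
  fit  <- glm(form, family = binomial, data = boot)
  predict(fit, newdata = valid, type = "response")
})

## 5th percentile across the 100 predictions as a conservative
## lower bound on each customer's purchase probability
valid$pred_lb <- apply(pred, 1, quantile, probs = 0.05)
```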
Logistic regression (GLM)
Data : train
Response variable : res1
Level : Yes in res1
Explanatory variables: zip_one, numords, last, version1, owntaxprod, upgraded
Null hyp.: there is no effect of x on res1
Alt. hyp.: there is an effect of x on res1
OR coefficient std.error z.value p.value
(Intercept) -3.712 0.063 -58.542 < .001 ***
zip_one|TRUE 7.526 2.018 0.055 36.799 < .001 ***
numords 1.313 0.272 0.016 17.451 < .001 ***
last 0.957 -0.044 0.002 -18.144 < .001 ***
version1 2.178 0.778 0.053 14.791 < .001 ***
owntaxprod 1.361 0.308 0.103 3.005 0.003 **
upgraded 2.698 0.993 0.051 19.591 < .001 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Pseudo R-squared: 0.112
Log-likelihood: -8923.029, AIC: 17860.057, BIC: 17922.137
Chi-squared: 2243.587 df(6), p.value < .001
Nr obs: 52,500
id pred_logit1 pred_logit2 pred_logit3 pred_logit4 pred_logit5 pred_logit6 pred_logit7 pred_logit8 pred_logit9 pred_logit10 pred_logit11 pred_logit12 pred_logit13 pred_logit14 pred_logit15 pred_logit16 pred_logit17 pred_logit18 pred_logit19
1 1 0.03358735 0.03343499 0.03200460 0.03571395 0.03151038 0.03298835 0.03267354 0.03258440 0.03173267 0.03092486 0.03173799 0.03362969 0.03175636 0.03213912 0.03261412 0.03124107 0.03276036 0.03309290 0.03242072
2 4 0.09892730 0.09685813 0.10862200 0.11227474 0.10475323 0.10236350 0.10103218 0.09245060 0.11279121 0.10408217 0.09943008 0.10620199 0.10963440 0.10265061 0.09649799 0.09900911 0.10601267 0.10419287 0.10721082
3 6 0.03944517 0.03836045 0.03831333 0.04089123 0.03812294 0.03873628 0.03815849 0.03831863 0.04279523 0.03794184 0.03823897 0.04114587 0.03822194 0.03959844 0.03716085 0.03851944 0.04050187 0.03772068 0.03990839
4 8 0.05752119 0.05490139 0.05555257 0.05592480 0.05739641 0.05541309 0.05482406 0.05468170 0.05163860 0.05332810 0.05719358 0.05203597 0.05176091 0.05409443 0.05514705 0.05645480 0.05796241 0.05816801 0.05130621
5 10 0.03400503 0.03539782 0.03376988 0.03681382 0.03326006 0.03387209 0.03340510 0.03498150 0.03357375 0.03353869 0.03341905 0.03500160 0.03391941 0.03492884 0.03418705 0.03291307 0.03489916 0.03415724 0.03479427
6 11 0.11767530 0.09495415 0.09775960 0.10044154 0.09770223 0.08749662 0.11636045 0.09904646 0.09899641 0.10612800 0.10252201 0.08700145 0.09749312 0.10693396 0.10840201 0.10533753 0.11882828 0.10940311 0.10143081
(columns pred_logit20 through pred_logit100 omitted for readability)
id pred_val_logit1 pred_val_logit2 pred_val_logit3 pred_val_logit4 pred_val_logit5 pred_val_logit6 pred_val_logit7 pred_val_logit8 pred_val_logit9 pred_val_logit10 pred_val_logit11 pred_val_logit12 pred_val_logit13 pred_val_logit14
1 2 0.02759774 0.02653257 0.02592432 0.02901585 0.02513120 0.02690952 0.02668875 0.02544153 0.02569974 0.02432149 0.02541437 0.02723609 0.02539422 0.02521562
2 3 0.09111374 0.09641764 0.08868128 0.09652013 0.09411716 0.09048192 0.08921644 0.10098606 0.09812846 0.09558083 0.09315198 0.09690039 0.09178538 0.10029707
3 5 0.03026197 0.02894857 0.02804265 0.03171018 0.02739197 0.02940621 0.02920810 0.02778903 0.02776422 0.02633426 0.02767885 0.02966796 0.02747774 0.02730645
4 7 0.03829394 0.03735933 0.03929685 0.03775026 0.03928853 0.03744354 0.03679787 0.03704020 0.03669559 0.03755830 0.03927954 0.03562245 0.03652858 0.03806908
5 9 0.01656117 0.01637529 0.01678601 0.01773840 0.01560045 0.01646173 0.01619318 0.01560729 0.01675975 0.01566653 0.01584487 0.01696067 0.01641542 0.01622749
6 12 0.16479032 0.17491747 0.15325816 0.17307200 0.16948392 0.16228148 0.16100501 0.18570568 0.16739866 0.16903918 0.16677853 0.17156884 0.16017410 0.17724028
(columns pred_val_logit15 through pred_val_logit100 omitted for readability)
Training prediction
# A tibble: 6 x 2
id mailto_wave2
<int> <lgl>
1 1 TRUE
2 4 TRUE
3 6 TRUE
4 8 TRUE
5 10 TRUE
6 11 TRUE
Validation prediction
# A tibble: 6 x 2
id rfm_resp
<int> <dbl>
1 2 0.0462
2 3 0.0693
3 5 0.0505
4 7 0.0248
5 9 0.0205
6 12 0.144
Logistic regression (GLM)
Data : train
Response variable : res1
Level : Yes in res1
Explanatory variables: VI, numords, last, version1, owntaxprod, upgraded
Null hyp.: there is no effect of x on res1
Alt. hyp.: there is an effect of x on res1
OR coefficient std.error z.value p.value
(Intercept) -3.776 0.065 -58.142 < .001 ***
VI|TRUE 20.517 3.021 0.065 46.461 < .001 ***
numords 1.339 0.292 0.016 18.291 < .001 ***
last 0.955 -0.046 0.002 -18.390 < .001 ***
version1 2.258 0.815 0.054 15.094 < .001 ***
owntaxprod 1.384 0.325 0.104 3.114 0.002 **
upgraded 2.844 1.045 0.052 20.130 < .001 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Pseudo R-squared: 0.144
Log-likelihood: -8594.701, AIC: 17203.403, BIC: 17265.483
Chi-squared: 2900.241 df(6), p.value < .001
Nr obs: 52,500
id pred_logit1 pred_logit2 pred_logit3 pred_logit4 pred_logit5 pred_logit6 pred_logit7 pred_logit8 pred_logit9 pred_logit10 pred_logit11 pred_logit12 pred_logit13 pred_logit14 pred_logit15 pred_logit16 pred_logit17 pred_logit18 pred_logit19
1 1 0.03255364 0.03240355 0.03092583 0.03463464 0.03040415 0.03174203 0.03175792 0.03124129 0.03044998 0.02963682 0.03045392 0.03271192 0.03116484 0.03142775 0.03195773 0.03050697 0.03192694 0.03218207 0.03153167
2 4 0.01393851 0.01401335 0.01454437 0.01514301 0.01320250 0.01375657 0.01376026 0.01301222 0.01435986 0.01332014 0.01335863 0.01449886 0.01419049 0.01412621 0.01371698 0.01398539 0.01440709 0.01359692 0.01442638
3 6 0.03814915 0.03699220 0.03810736 0.04023595 0.03728228 0.03729778 0.03720487 0.03695550 0.04190864 0.03717999 0.03709756 0.04101236 0.03702016 0.03871725 0.03664923 0.03825281 0.04077682 0.03711319 0.03904759
4 8 0.05648616 0.05335497 0.05423993 0.05588224 0.05768392 0.05438269 0.05425305 0.05364427 0.05087367 0.05293504 0.05728093 0.05070596 0.05081839 0.05384368 0.05391388 0.05659069 0.05668984 0.05677823 0.05130205
5 10 0.03327651 0.03488965 0.03312637 0.03603131 0.03246416 0.03322403 0.03276011 0.03410024 0.03269297 0.03255651 0.03246666 0.03444978 0.03372893 0.03446618 0.03388593 0.03216208 0.03437519 0.03359114 0.03402267
6 11 0.12272292 0.09577005 0.10122000 0.10696755 0.10044470 0.09221386 0.12026857 0.10281456 0.10543116 0.10830848 0.10217258 0.09022829 0.10074825 0.10805934 0.11138847 0.11211335 0.12340388 0.11607161 0.10309759
(columns pred_logit20 through pred_logit100 omitted for readability)
  id pred_val_logit1 pred_val_logit2 ... pred_val_logit100
1  2      0.02625721      0.02516754 ...        0.02633093
2  3      0.09406669      0.09944895 ...        0.10017293
3  5      0.02892089      0.02752161 ...        0.02879715
4  7      0.03687153      0.03594319 ...        0.03482690
5  9      0.01537223      0.01534052 ...        0.01603679
6 12      0.17556151      0.18481967 ...        0.18269133
(head of the 100 bootstrapped logistic regression predictions on the validation data; middle columns omitted for readability)
Training prediction
# A tibble: 6 x 2
id prob_log_lbA
<int> <dbl>
1 1 0.0311
2 4 0.0936
3 6 0.0371
4 8 0.0508
5 10 0.0326
6 11 0.0875
Validation prediction
# A tibble: 6 x 2
id prob_log_lbA
<int> <dbl>
1 2 0.0246
2 3 0.0895
3 5 0.0267
4 7 0.0356
5 9 0.0155
6 12 0.158
Training
[1] “Based on our analysis, the number of customers Intuit should mail is 32,014 that is 60.98% of the customers. The response rate for the selected customers is predicted to be 6.78%, or, 2,171 buyers; while the actual response rate is 4.76%, or, 2,498. The predicted margin is $130,260.00; while actual margin is $149,880.00. The expected profit is $85,120. The messaging cost is estimated to be $45,140 with a ROME of 1.89.”
Validation
[1] “Based on our analysis, the number of customers Intuit should mail is 13,902 that is 61.79% of the customers. The response rate for the selected customers is predicted to be 6.91%, or, 961 buyers; while the actual response rate is 4.90%, or, 1,103. The predicted margin is $57,660.00; while actual margin is $66,180.00. The expected profit is $38,058. The messaging cost is estimated to be $19,602 with a ROME of 1.94.”
Training
[1] “Based on our analysis, the number of customers Intuit should mail is 30,293 that is 57.70% of the customers. The response rate for the selected customers is predicted to be 7.05%, or, 2,137 buyers; while the actual response rate is 4.76%, or, 2,498. The predicted margin is $128,220.00; while actual margin is $149,880.00. The expected profit is $85,507. The messaging cost is estimated to be $42,713 with a ROME of 2.00.”
Validation
[1] “Based on our analysis, the number of customers Intuit should mail is 13,131 that is 58.36% of the customers. The response rate for the selected customers is predicted to be 7.20%, or, 946 buyers; while the actual response rate is 4.90%, or, 1,103. The predicted margin is $56,760.00; while actual margin is $66,180.00. The expected profit is $38,245. The messaging cost is estimated to be $18,515 with a ROME of 2.07.”
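The profit and ROME figures in these summaries follow directly from the $1.41 mailing cost and $60 margin. A minimal sketch using the validation numbers quoted above (the variable names are ours, not from the assignment code):

```r
# Profit and ROME implied by the reported validation numbers
mail_cost <- 1.41   # cost per mailing
margin <- 60        # margin per responder
n_mail <- 13131     # customers selected for mailing
pred_buyers <- 946  # predicted responders among them

pred_margin <- pred_buyers * margin  # $56,760
mail_total <- n_mail * mail_cost     # ~$18,515
profit <- pred_margin - mail_total   # ~$38,245
rome <- profit / mail_total          # ~2.07
```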
We fit the Naive Bayes model with Laplace = 1 to avoid situations where the classifier assigns probability 0 to feature values unseen in the training data.
Naive Bayes Classifier
Data : train
Response variable : res1
Levels : Yes, No in res1
Explanatory variables: VI, numords, dollars, last, version1, owntaxprod, upgraded
Laplace : 1
Nr obs : 52,500
A-priori probabilities:
res1
Yes No
0.048 0.952
Conditional probabilities (categorical) or means & st.dev (numeric):
VI
res1 FALSE TRUE
Yes 0.787 0.213
No 0.984 0.016
numords
res1 mean st.dev
Yes 2.568 1.436
No 2.046 1.224
dollars
res1 mean st.dev
Yes 117.000 103.241
No 91.467 79.422
last
res1 mean st.dev
Yes 12.022 8.942
No 16.048 9.536
version1
res1 mean st.dev
Yes 0.285 0.451
No 0.210 0.407
owntaxprod
res1 mean st.dev
Yes 0.049 0.216
No 0.028 0.164
upgraded
res1 mean st.dev
Yes 0.335 0.472
No 0.201 0.401
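To see why Laplace = 1 matters, consider a categorical level that never occurs among responders: without smoothing its conditional probability is exactly 0, which zeroes out the entire posterior product for that class. A minimal sketch (the function and counts are illustrative, not radiant's internals):

```r
# Laplace-smoothed conditional probability P(level | class).
# n_level: count of this level within the class; n_class: class size;
# k: number of distinct levels of the predictor.
smoothed_prob <- function(n_level, n_class, k, laplace = 1) {
  (n_level + laplace) / (n_class + laplace * k)
}

smoothed_prob(0, 2498, 2, laplace = 0)  # without smoothing: exactly 0
smoothed_prob(0, 2498, 2)               # with Laplace = 1: small but non-zero
```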
Training Prediction
Naive Bayes Classifier
Data : train
Response variable : res1
Level(s) : Yes, No in res1
Explanatory variables: VI, numords, dollars, last, version1, owntaxprod, upgraded
Prediction dataset : train
Rows shown : 10 of 52,500
VI numords dollars last version1 owntaxprod upgraded Yes No
FALSE 2 109.500 5 0 0 0 0.018 0.982
FALSE 1 22.000 17 0 0 0 0.009 0.991
FALSE 1 20.000 17 0 0 1 0.026 0.974
FALSE 1 24.500 4 1 0 0 0.029 0.971
FALSE 3 73.500 10 0 0 0 0.019 0.981
FALSE 2 99.500 7 0 1 1 0.993 0.007
FALSE 1 49.500 22 0 0 0 0.006 0.994
FALSE 1 52.000 22 0 0 0 0.006 0.994
FALSE 1 69.500 27 0 0 0 0.005 0.995
FALSE 4 264.500 15 0 0 1 0.246 0.754
Validation Prediction
Naive Bayes Classifier
Data : train
Response variable : res1
Level(s) : Yes, No in res1
Explanatory variables: VI, numords, dollars, last, version1, owntaxprod, upgraded
Prediction dataset : val
Rows shown : 10 of 22,500
VI numords dollars last version1 owntaxprod upgraded Yes No
FALSE 1 69.500 4 0 0 0 0.014 0.986
FALSE 4 93.000 14 0 0 1 0.079 0.921
FALSE 1 24.500 2 0 0 0 0.016 0.984
FALSE 1 49.500 13 1 0 0 0.020 0.980
FALSE 1 44.500 15 0 0 0 0.009 0.991
FALSE 5 79.000 5 0 0 1 0.196 0.804
FALSE 1 38.000 5 1 0 0 0.027 0.973
FALSE 2 40.500 10 0 0 0 0.013 0.987
FALSE 2 105.500 9 0 0 0 0.015 0.985
FALSE 2 136.000 27 0 0 0 0.007 0.993
Training
[1] “Based on our analysis, the number of customers Intuit should mail is 21,502 that is 40.96% of the customers. The response rate for the selected customers is predicted to be 8.86%, or, 1,905 buyers; while the actual response rate is 4.76%, or, 2,498. The predicted margin is $114,300.00; while actual margin is $149,880.00. The expected profit is $83,982. The messaging cost is estimated to be $30,318 with a ROME of 2.77.”
Validation
[1] “Based on our analysis, the number of customers Intuit should mail is 9,276 that is 41.23% of the customers. The response rate for the selected customers is predicted to be 9.10%, or, 844 buyers; while the actual response rate is 4.90%, or, 1,103. The predicted margin is $50,640.00; while actual margin is $66,180.00. The expected profit is $37,561. The messaging cost is estimated to be $13,079 with a ROME of 2.87.”
Because a neural network is particularly able to capture complex relationships and interactions between the features and the response variable, we first fit the model with all of the important features to check whether the variable importance it identifies agrees with our earlier conclusions.
Neural Network
Activation function : Logistic (classification)
Data : intuit75k
Filter : train <- training == 1
Response variable : res1
Level : Yes in res1
Explanatory variables: numords, dollars, last, sincepurch, version1, owntaxprod, upgraded, zip_one, VI
Network size : 2
Parameter decay : 0.5
Seed : 1234
Network : 9-2-1 with 23 weights
Nr obs : 52,500
Weights :
b->h1 i1->h1 i2->h1 i3->h1 i4->h1 i5->h1 i6->h1 i7->h1 i8->h1 i9->h1
1.29 -0.27 -0.20 0.52 0.02 -1.78 -0.08 -0.70 -0.10 -2.68
b->h2 i1->h2 i2->h2 i3->h2 i4->h2 i5->h2 i6->h2 i7->h2 i8->h2 i9->h2
-1.40 -1.23 0.01 1.35 -0.13 2.11 -0.40 -1.54 0.32 -2.15
b->o h1->o h2->o
0.89 -4.74 -3.01
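The "9-2-1 with 23 weights" line can be verified by counting parameters: each of the 2 hidden units takes the 9 inputs plus a bias, and the output unit takes the 2 hidden units plus a bias.

```r
# Weight count for a 9-2-1 network with bias units
inputs <- 9; hidden <- 2; outputs <- 1
n_weights <- (inputs + 1) * hidden + (hidden + 1) * outputs  # 20 + 3 = 23
```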
The Olden plot shows that “dollars”, “last”, “sincepurch” and “zip_one” are relatively less important factors when predicting purchase probabilities. This aligns with our earlier findings, so we build the neural network with the same variables as the logistic regression. As with the logistic regression models, we use the 5th-percentile lower bound of the bootstrap predictions as the purchase probability for labeling customers.
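The lower-bound step can be sketched as follows: for each customer, take the 5th percentile across the 100 bootstrap prediction columns (random numbers stand in for the real trnn1–trnn100 columns here):

```r
# Illustrative sketch: 5th-percentile lower bound across 100 bootstrap
# prediction columns (random numbers stand in for trnn1-trnn100)
set.seed(1234)
preds <- as.data.frame(matrix(runif(6 * 100, 0.01, 0.20), nrow = 6))
prob_nn_lb <- apply(preds, 1, quantile, probs = 0.05)
```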
  id      trnn1      trnn2 ...    trnn100
1  1 0.03056845 0.03555882 ... 0.03252547
2  4 0.01515693 0.01518377 ... 0.01452901
3  6 0.04432302 0.04820826 ... 0.04710874
4  8 0.05217575 0.04573376 ... 0.04461941
5 10 0.03122532 0.03727811 ... 0.03203346
6 11 0.10946139 0.07517193 ... 0.12170499
(head of the 100 bootstrapped neural network predictions on the training data; middle columns omitted for readability)
  id      vlnn1      vlnn2 ...    vlnn100
1  2 0.02628076 0.02997000 ... 0.02782791
2  3 0.09271819 0.09213025 ... 0.09207923
3  5 0.02810009 0.03206981 ... 0.03061706
4  7 0.02341829 0.02153186 ... 0.02536695
5  9 0.01675887 0.01737026 ... 0.01605116
6 12 0.15542802 0.13358745 ... 0.13318916
(head of the 100 bootstrapped neural network predictions on the validation data; middle columns omitted for readability)
Trainging prediction
# A tibble: 6 x 2
id pred_nb
<int> <dbl>
1 1 0.0176
2 4 0.00857
3 6 0.0261
4 8 0.0289
5 10 0.0188
6 11 0.993
Validation Prediction
# A tibble: 6 x 2
id pred_nb
<int> <dbl>
1 2 0.0143
2 3 0.0793
3 5 0.0155
4 7 0.0200
5 9 0.00919
6 12 0.196
Training prediction
[1] “Based on our analysis, the number of customers Intuit should mail is 26,920 that is 51.28% of the customers. The response rate for the selected customers is predicted to be 7.73%, or, 2,082 buyers; while the actual response rate is 4.76%, or, 2,498. The predicted margin is $124,920.00; while actual margin is $149,880.00. The expected profit is $86,963. The messaging cost is estimated to be $37,957 with a ROME of 2.29.”
Validation prediction
[1] “Based on our analysis, the number of customers Intuit should mail is 11,711 that is 52.05% of the customers. The response rate for the selected customers is predicted to be 7.76%, or, 909 buyers; while the actual response rate is 4.90%, or, 1,103. The predicted margin is $54,540.00; while actual margin is $66,180.00. The expected profit is $38,027. The messaging cost is estimated to be $16,513 with a ROME of 2.30.”
Now that we have built four models, it is time to review their performance side by side. We first compare profit and ROME on both the training and validation data, then visualize the lift and gains charts to evaluate each model's efficiency; lastly, we compare the models' AUC scores and construct the confusion matrix.
The predicted training profit is highest under the NN model, beating the second-highest, the logistic model, by 1.70%. However, this may simply reflect the NN's strong ability to fit the training data; it could be overfitting relative to the logistic model. The validation results speak more directly to this question.
Looking at the validation predictions, the NN still performs well, but this time its profit is surpassed by the logistic model's by 0.57%. Based on this result, we believe the logistic model generalizes better and is therefore the best final model for targeting customers.
On both the training and validation sets, the Naive Bayes model outperforms all the others on return on marketing expenditure (ROME); however, given its weaker profit predictions, we do not prefer it over the logistic model for targeting customers.
The lift chart shows that the machine learning models are much more efficient at predicting and targeting purchases than the non-machine-learning benchmark, sequential RFM. The three machine learning models are nearly indistinguishable in the lift and gains charts, but the logistic and NN models still capture slightly more responses than the others when targeting the same percentage of customers. This suggests that either of these two models can help Intuit generate more profit with a smaller mailing budget.
Investigating the confusion matrix, we can figure out how logistic model surpassed NN when predicted on the validation data. As NN model may learn the pattern in the training data too well, it inherited a more strict standard to judge a customer’s purchase probability; therefore, it would give many customers lower probabilities compared to logistic models, who instead generalized the data and judge the customers more liberal. Due to this reason, NN has lower FP and higher TNR while LR has higher FP but lower TNR. In our case, the mail cost is extremely low compared to the high margin, so it costs Intuit virtually nothing to send out a bit more mail but costs a lot if missed a potential buyer.
The confusion matrix reinforces our confidence that the logistic regression model is the best one to use for the final prediction.
Confusion matrix
Data : new_intuit
Filter : training == 1
Results for: Both
Predictors : rfm_resp, prob_log_lbB, pred_nb, prob_nn_lb1
Response : res1
Level : Yes in res1
Cost:Margin: 1.41 : 60
| Type | Predictor | TP | FP | TN | FN | total | TPR | TNR | precision | Fscore |
|------|-----------|----|----|----|----|-------|-----|-----|-----------|--------|
| Training | rfm_resp | 2,312 | 40,530 | 9,472 | 186 | 52,500 | 0.926 | 0.189 | 0.054 | 0.102 |
| Training | prob_log_lbB | 2,137 | 28,156 | 21,846 | 361 | 52,500 | 0.855 | 0.437 | 0.071 | 0.130 |
| Training | pred_nb | 1,905 | 19,597 | 30,405 | 593 | 52,500 | 0.763 | 0.608 | 0.089 | 0.159 |
| Training | prob_nn_lb1 | 2,082 | 24,838 | 25,164 | 416 | 52,500 | 0.833 | 0.503 | 0.077 | 0.142 |
| Test | rfm_resp | 1,022 | 17,166 | 4,231 | 81 | 22,500 | 0.927 | 0.198 | 0.056 | 0.106 |
| Test | prob_log_lbB | 946 | 12,185 | 9,212 | 157 | 22,500 | 0.858 | 0.431 | 0.072 | 0.133 |
| Test | pred_nb | 844 | 8,432 | 12,965 | 259 | 22,500 | 0.765 | 0.606 | 0.091 | 0.163 |
| Test | prob_nn_lb1 | 909 | 10,802 | 10,595 | 194 | 22,500 | 0.824 | 0.495 | 0.078 | 0.142 |
| Type | Predictor | accuracy | kappa | profit | index | ROME | contact | AUC |
|------|-----------|----------|-------|--------|-------|------|---------|-----|
| Training | rfm_resp | 0.224 | 0.013 | 78,313 | 0.901 | 1.296 | 0.816 | 0.664 |
| Training | prob_log_lbB | 0.457 | 0.047 | 85,507 | 0.983 | 2.002 | 0.577 | 0.765 |
| Training | pred_nb | 0.615 | 0.080 | 83,982 | 0.966 | 2.770 | 0.410 | 0.746 |
| Training | prob_nn_lb1 | 0.519 | 0.060 | 86,963 | 1.000 | 2.291 | 0.513 | 0.772 |
| Test | rfm_resp | 0.233 | 0.015 | 35,675 | 0.933 | 1.391 | 0.808 | 0.680 |
| Test | prob_log_lbB | 0.451 | 0.047 | 38,245 | 1.000 | 2.066 | 0.584 | 0.764 |
| Test | pred_nb | 0.614 | 0.082 | 37,561 | 0.982 | 2.872 | 0.412 | 0.743 |
| Test | prob_nn_lb1 | 0.511 | 0.057 | 38,027 | 0.994 | 2.303 | 0.520 | 0.765 |
As we know, customers will be less likely to respond in Wave 2 than in Wave 1. To make sure we have picked the best model, we re-evaluate the four models using a higher cutoff to label whether a customer will respond. The new cutoff, 0.05, is double the break-even response rate used in Parts I through III, reflecting the expected 50% reduction in the response rate.
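This arithmetic can be checked quickly in R, taking the $1.41 mail cost and $60 margin from the confusion-matrix header (note the doubled value comes to 0.047, so the 0.05 cutoff appears to be rounded up slightly):

```r
mail_cost <- 1.41
margin <- 60
break_even <- mail_cost / margin   # cutoff used in Parts I through III
wave2_cutoff <- 2 * break_even     # doubled for the expected 50% drop in response
c(break_even = break_even, wave2_cutoff = wave2_cutoff)
```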
Revised profitable RFM ids
[1] "142" "354" "354" "154" "224" "232" "453" "452" "552" "311"
Training [1] “Based on our analysis, the number of customers Intuit should mail is 52,500 that is 100.00% of the customers. The response rate for the selected customers is predicted to be 4.76%, or, 2,498 buyers; while the actual response rate is 4.76%, or, 2,498. The predicted margin is $149,880.00; while actual margin is $149,880.00. The expected profit is $75,855. The messaging cost is estimated to be $74,025 with a ROME of 1.02.”
Validation [1] “Based on our analysis, the number of customers Intuit should mail is 22,500 that is 100.00% of the customers. The response rate for the selected customers is predicted to be 4.90%, or, 1,103 buyers; while the actual response rate is 4.90%, or, 1,103. The predicted margin is $66,180.00; while actual margin is $66,180.00. The expected profit is $34,455. The messaging cost is estimated to be $31,725 with a ROME of 1.09.”
To be prudent, we redo the process from the previous part to check whether “VI” is indeed a better predictor than “zip_one”.
Training
[1] “Based on our analysis, the number of customers Intuit should mail is 14,812 that is 28.21% of the customers. The response rate for the selected customers is predicted to be 10.82%, or, 1,602 buyers; while the actual response rate is 4.76%, or, 2,498. The predicted margin is $96,120.00; while actual margin is $149,880.00. The expected profit is $75,235. The messaging cost is estimated to be $20,885 with a ROME of 3.60.”
Validation
[1] “Based on our analysis, the number of customers Intuit should mail is 6,413 that is 28.50% of the customers. The response rate for the selected customers is predicted to be 11.40%, or, 731 buyers; while the actual response rate is 4.90%, or, 1,103. The predicted margin is $43,860.00; while actual margin is $66,180.00. The expected profit is $34,818. The messaging cost is estimated to be $9,042 with a ROME of 3.85.”
Training
[1] “Based on our analysis, the number of customers Intuit should mail is 13,505 that is 25.72% of the customers. The response rate for the selected customers is predicted to be 11.66%, or, 1,575 buyers; while the actual response rate is 4.76%, or, 2,498. The predicted margin is $94,500.00; while actual margin is $149,880.00. The expected profit is $75,458. The messaging cost is estimated to be $19,042 with a ROME of 3.96.”
Validation
[1] “Based on our analysis, the number of customers Intuit should mail is 5,867 that is 26.08% of the customers. The response rate for the selected customers is predicted to be 12.25%, or, 719 buyers; while the actual response rate is 4.90%, or, 1,103. The predicted margin is $43,140.00; while actual margin is $66,180.00. The expected profit is $34,868. The messaging cost is estimated to be $8,272 with a ROME of 4.21.”
From the profit comparison chart we can see that “VI” is a better predictor than “zip_one” in the logistic model, yielding higher profit even when the response rate decreases. This time, ROME is also much higher when using “VI” to target customers.
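As a sanity check on these Radiant summaries, the validation profit and ROME reported above for the “VI” logistic model can be reproduced from the printed mailing count, predicted buyers, margin, and mail cost:

```r
mailed <- 5867         # customers mailed in the validation set
buyers <- 719          # predicted buyers
margin <- 60           # margin per buyer
mail_cost <- 1.41      # cost per mail piece
rev <- buyers * margin          # $43,140 predicted margin
cost <- mailed * mail_cost      # about $8,272 messaging cost
profit <- rev - cost            # about $34,868 expected profit
rome <- profit / cost           # about 4.21
round(c(profit = profit, rome = rome), 2)
```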
Training
[1] “Based on our analysis, the number of customers Intuit should mail is 11,092 that is 21.13% of the customers. The response rate for the selected customers is predicted to be 12.52%, or, 1,389 buyers; while the actual response rate is 4.76%, or, 2,498. The predicted margin is $83,340.00; while actual margin is $149,880.00. The expected profit is $67,700. The messaging cost is estimated to be $15,640 with a ROME of 4.33.”
Validation
[1] “Based on our analysis, the number of customers Intuit should mail is 4,836 that is 21.49% of the customers. The response rate for the selected customers is predicted to be 12.76%, or, 617 buyers; while the actual response rate is 4.90%, or, 1,103. The predicted margin is $37,020.00; while actual margin is $66,180.00. The expected profit is $30,201. The messaging cost is estimated to be $6,819 with a ROME of 4.43.”
Training
[1] “Based on our analysis, the number of customers Intuit should mail is 11,873 that is 22.62% of the customers. The response rate for the selected customers is predicted to be 12.70%, or, 1,508 buyers; while the actual response rate is 4.76%, or, 2,498. The predicted margin is $90,480.00; while actual margin is $149,880.00. The expected profit is $73,739. The messaging cost is estimated to be $16,741 with a ROME of 4.40.”
Validation
[1] “Based on our analysis, the number of customers Intuit should mail is 5,201 that is 23.12% of the customers. The response rate for the selected customers is predicted to be 12.98%, or, 675 buyers; while the actual response rate is 4.90%, or, 1,103. The predicted margin is $40,500.00; while actual margin is $66,180.00. The expected profit is $33,167. The messaging cost is estimated to be $7,333 with a ROME of 4.52.”
Let’s review the performance altogether once more. We’ll first compare profits and ROME for both the training and validation data, then visualize the lift and gains under the different models to evaluate efficiency; lastly, we’ll compare the models’ AUC scores and construct the confusion matrix.
The highest predicted training profit is now under the sequential RFM model, leading the second-highest, from the logistic model, by 0.53%. But what we really care about is performance on the validation set, where the highest predicted profit still belongs to the logistic model.
From the lift chart, we can now see a distinct advantage of the logistic and NN models over the other models, including Sequential RFM. In the gains chart as well, the logistic and NN models capture more purchases than the other models when targeting the same proportion of customers. This suggests that using these two models can help Intuit generate more profit with a smaller budget.
Investigating the confusion matrix at the new cutoff, we still see a similar pattern in how the logistic model surpasses the NN model on the validation data.
The confusion matrix again reinforces our confidence that the logistic regression model is the best one to use for the final prediction.
Confusion matrix
Data : new_intuit2
Filter : training == 1
Results for: Both
Predictors : rfm_resp, prob_log_lbB, pred_nb, prob_nn_lb1
Response : res1
Level : Yes in res1
Cost:Margin: 1.41 : 60
| Type | Predictor | TP | FP | TN | FN | total | TPR | TNR | precision | Fscore |
|------|-----------|----|----|----|----|-------|-----|-----|-----------|--------|
| Training | rfm_resp | 2,312 | 40,530 | 9,472 | 186 | 52,500 | 0.926 | 0.189 | 0.054 | 0.102 |
| Training | prob_log_lbB | 2,137 | 28,156 | 21,846 | 361 | 52,500 | 0.855 | 0.437 | 0.071 | 0.130 |
| Training | pred_nb | 1,905 | 19,597 | 30,405 | 593 | 52,500 | 0.763 | 0.608 | 0.089 | 0.159 |
| Training | prob_nn_lb1 | 2,082 | 24,838 | 25,164 | 416 | 52,500 | 0.833 | 0.503 | 0.077 | 0.142 |
| Test | rfm_resp | 1,022 | 17,166 | 4,231 | 81 | 22,500 | 0.927 | 0.198 | 0.056 | 0.106 |
| Test | prob_log_lbB | 946 | 12,185 | 9,212 | 157 | 22,500 | 0.858 | 0.431 | 0.072 | 0.133 |
| Test | pred_nb | 844 | 8,432 | 12,965 | 259 | 22,500 | 0.765 | 0.606 | 0.091 | 0.163 |
| Test | prob_nn_lb1 | 909 | 10,802 | 10,595 | 194 | 22,500 | 0.824 | 0.495 | 0.078 | 0.142 |
| Type | Predictor | accuracy | kappa | profit | index | ROME | contact | AUC |
|------|-----------|----------|-------|--------|-------|------|---------|-----|
| Training | rfm_resp | 0.224 | 0.013 | 78,313 | 0.901 | 1.296 | 0.816 | 0.664 |
| Training | prob_log_lbB | 0.457 | 0.047 | 85,507 | 0.983 | 2.002 | 0.577 | 0.765 |
| Training | pred_nb | 0.615 | 0.080 | 83,982 | 0.966 | 2.770 | 0.410 | 0.746 |
| Training | prob_nn_lb1 | 0.519 | 0.060 | 86,963 | 1.000 | 2.291 | 0.513 | 0.772 |
| Test | rfm_resp | 0.233 | 0.015 | 35,675 | 0.933 | 1.391 | 0.808 | 0.680 |
| Test | prob_log_lbB | 0.451 | 0.047 | 38,245 | 1.000 | 2.066 | 0.584 | 0.764 |
| Test | pred_nb | 0.614 | 0.082 | 37,561 | 0.982 | 2.872 | 0.412 | 0.743 |
| Test | prob_nn_lb1 | 0.511 | 0.057 | 38,027 | 0.994 | 2.303 | 0.520 | 0.765 |
Based on the exploratory data analysis and the modeling process, we decided to use the predictions of Logistic Regression B from Part VI as the guideline for targeting customers.
Intuit has 801,821 customers in total, of whom 38,487 responded in Wave 1, leaving 763,334 un-responded customers for Wave 2. In the validation set, we have 21,397 un-responded customers and will mail 4,764 of them, or 22.26% of all un-responded customers in the validation set. Scaling to the full set of un-responded customers, we will mail 169,955 of them. The predicted validation profit is $34,867.53, or $5.94 per mailed Wave 2 customer as estimated by our best model. Multiplying by the projected 169,955 mail-to customers gives a scaled total profit of $1,010,039.99.
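The scaling arithmetic above can be reproduced in R (customer counts taken from the text; per-mail profit uses the validation mailing count of 5,867 implied by the $5.94 figure):

```r
total_customers <- 801821
wave1_resp <- 38487
wave2_pool <- total_customers - wave1_resp    # 763,334 un-responded customers
mail_frac <- 4764 / 21397                     # share mailed in the validation set
target_n <- round(mail_frac * wave2_pool)     # about 169,955 customers to mail
profit_per_mail <- 34867.53 / 5867            # about $5.94 per mailed customer
round(c(target_n = target_n, profit_per_mail = profit_per_mail), 2)
```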
First, looking at the results from our best model, all of the features are statistically significant: “VI”, “numords”, “version1”, “owntaxprod”, and “upgraded” each significantly affect a customer’s response probability. More specifically, we interpret each feature’s effect as follows:
For each additional order placed with Intuit Direct, a customer’s odds of response increase by 33.9%;
Customers whose current QuickBooks is version 1 have 120.2% higher odds of response;
Customers who purchased tax software have 38.4% higher odds of response;
Customers who upgraded from QuickBooks version 1 to version 2 have 125.8% higher odds of response;
Customers located in Virginia have 1,951.7% higher odds of response.
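The percentage effects quoted above are the standard (OR − 1) × 100 transformation of each coefficient’s odds ratio; a minimal sketch, with the odds-ratio values back-derived from the percentages in the text:

```r
# Odds ratios implied by the percentage effects quoted in the text
or <- c(numords = 1.339, version1 = 2.202, owntaxprod = 1.384,
        upgraded = 2.258, VI = 20.517)
# Percentage change in the odds of response per unit increase in each predictor
pct_change <- (or - 1) * 100
round(pct_change, 1)
```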
Second, the Olden plot indicates that the variables described above are important to the response probability. “VI” shows the greatest importance in the plot, and importance decreases in the order of “upgraded”, “numords”, “version1”, and “owntaxprod”.
Overall, we conclude that the businesses most likely to upgrade are those located in Virginia, currently using version 1 or having upgraded from version 1 to version 2, owning tax software, and having placed a large number of orders previously.